<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<HTML>
<HEAD>
<META NAME="generator" CONTENT="http://txt2tags.org">
<TITLE>BNF Converter Tutorial</TITLE>
</HEAD><BODY BGCOLOR="white" TEXT="black">
<CENTER>
<H1>BNF Converter Tutorial</H1>
<FONT SIZE="4"><I>Aarne Ranta</I></FONT><BR>
</CENTER>

<P>
<!-- NEW -->
</P>

<H2>BNFC = BNF Converter</H2>

<P>
BNF = Backus-Naur Form (also known as Context-Free Grammars).
</P>
<P>
BNF is the standard format for the
<B>specification</B> and <B>documentation</B> of programming languages.
</P>
<P>
BNFC makes BNF usable for <B>implementation</B> as well.
</P>
<P>
BNFC is a high-level front end to traditional implementation formats
(in the style of Lex and YACC):
<I>"BNFC is a compiler compiler compiler"</I>
</P>
<P>
BNFC saves 90% of source code writing in a typical compiler front end.
</P>
<P>
BNFC can be used for projects carried out in C, C++, C#, F#, Haskell, Java, OCaml.
</P>
<P>
<!-- NEW -->
</P>

<H2>What BNFC can be used for</H2>

<P>
Strongest case: when designing and implementing a new programming language
</P>

<UL>
<LI>rapid development
<LI>guaranteed consistency of components
<LI>automatic documentation (in latex and html)
<LI>freedom to change implementation language
</UL>

<P>
BNFC also scales up to legacy programming languages
</P>

<UL>
<LI>Ansi C and Java have been implemented
<LI>but some languages have rough corners (e.g. Haskell)
</UL>

<P>
<!-- NEW -->
</P>

<H2>How to obtain BNFC</H2>

<P>
BNFC homepage:
<A HREF="http://digitalgrammars.com/bnfc/"><CODE>digitalgrammars.com/bnfc/</CODE></A>
</P>
<P>
You can obtain
</P>

<UL>
<LI>a Windows binary
<LI>a Linux (i386) binary
<LI>a Mac OS X (i386) binary
<LI>a source package
</UL>

<P>
If you are using Debian or Ubuntu Linux, you can obtain BNFC with their
package system (but it is a slightly older version).
</P>
<P>
For the binaries, it is enough to download them and put into a place where you
can find executables (such as <CODE>/usr/local/bin</CODE> on Unix-like platforms).
</P>
<P>
<!-- NEW -->
</P>

<H2>Working with BNFC sources</H2>

<P>
If you choose the source package, you need the
<A HREF="http://www.haskell.org/ghc/">GHC Haskell Compiler</A>. With GHC in place,
just unpack the sources, <CODE>cd</CODE> to <CODE>BNFC</CODE>, and type <CODE>make</CODE>.
</P>
<P>
If you want to contribute to BNFC (more than 12 persons have!), you
can fork the <A HREF="https://github.com/BNFC/bnfc">BNFC repository</A> on github.com
and submit pull requests.
</P>
<P>
BNFC is licensed under GPL.
</P>
<P>
<!-- NEW -->
</P>

<H2>Calling BNFC</H2>

<P>
When you have BNFC in place (i.e. on your path), you can call it via
</P>

<PRE>
    bnfc
</PRE>

<P>
This gives you a list of available options. The most common choices are:
</P>

<PRE>
    bnfc -m -c       FILE.cf      # to generate C
    bnfc -m -cpp_stl FILE.cf      # to generate C++ with STL
    bnfc -m -java1.5 FILE.cf      # to generate Java 1.5
    bnfc -m -haskell FILE.cf      # to generate Haskell
</PRE>

<P>
The <CODE>-m</CODE> flag makes BNFC to generate a <CODE>Makefile</CODE>. This means
that, after running <CODE>bnfc</CODE>, you can create an executable parser by
</P>

<PRE>
    make
</PRE>

<P>
Let us now create our first application from a BNFC source file.
</P>
<P>
<!-- NEW -->
</P>

<H2>My first compiler: calculator</H2>

<P>
We start with everyone's favourite: the desktop calculator.
</P>
<P>
To make it the simplest possible, we restrict ourselves to integers, with
addition, subtraction, multiplication, and (lossy) division.
</P>
<P>
The input language is defined with the following BNFC grammar.
</P>

<PRE>
    EAdd. Exp  ::= Exp  "+" Exp1 ;
    ESub. Exp  ::= Exp  "-" Exp1 ;
    EMul. Exp1 ::= Exp1 "*" Exp2 ;
    EDiv. Exp1 ::= Exp1 "/" Exp2 ;
    EInt. Exp2 ::= Integer ;
  
    coercions Exp 2 ;
</PRE>

<P>
Copy this code into a file called <CODE>Calc.cf</CODE>.
</P>
<P>
We return to the details of the notation after trying this out in BNFC.
</P>
<P>
<!-- NEW -->
</P>

<H2>Compiling the compiler in Java</H2>

<P>
If you want to work in Haskell, C, or C++, skip a few sections now.
</P>
<P>
Assuming you want to work in Java, do the following:
</P>

<PRE>
    bnfc -m -java1.5 Calc.cf
    make
</PRE>

<P>
If everything goes fine, this will create a parser test class, which you
can try out in the following way:
</P>

<PRE>
    echo "23 + 4 * 70" | java Calc/Test
  
    Parse Succesful!
    [Abstract Syntax]
    (EAdd (EInt 23) (EMul (EInt 4) (EInt 70)))
    [Linearized Tree]
    23 + 4 * 70
</PRE>

<P>
However, this will probably not work at first time: you will have to
install some more software and set a classpath.
</P>
<P>
<!-- NEW -->
</P>

<H2>Putting parser and lexer tools for Java in place</H2>

<P>
When you called BNFC, you saw it generate lots of files. Most of them were
<CODE>.java</CODE> files, but there were two special ones:
</P>

<UL>
<LI><CODE>Calc/Yylex</CODE>: lexer specification file in the JLex format
<LI><CODE>Calc/Calc.cup</CODE>: parser specification file in the Cup format
</UL>

<P>
You will need to download and install both of the required tools:
</P>

<UL>
<LI><A HREF="http://www.cs.princeton.edu/~appel/modern/java/CUP/">Cup</A>, version 0.10k
<LI><A HREF="http://www.cs.princeton.edu/~appel/modern/java/JLex/">JLex</A>, version 1.2.6
</UL>

<P>
It is safest to take the named versions,
even if there were newer ones available.
</P>
<P>
In addition to these, you of course need the Java compiler and runtime
environment available from <A HREF="http://java.sun.com">Sun</A>.
</P>
<P>
Cup comes as a package of java and class files. Just unpack it in some
directory, e.g. <CODE>/usr/local/java/Cup</CODE>.
</P>
<P>
JLex is just one Java file. Put it in some directory,
e.g. <CODE>/usr/local/java/JLex</CODE>. Compile it with <CODE>javac Main.java</CODE> in that
directory.
</P>
<P>
<!-- NEW -->
</P>

<H2>Setting the Java class path</H2>

<P>
After installing Cup and JLex, compiling with
</P>

<PRE>
    bnfc -m -java1.5 Calc.cf
    make
</PRE>

<P>
will probably still fail, with a message saying that a class was not found.
Fix this by
</P>

<PRE>
    export CLASSPATH=.:/usr/local/java/Cup:/usr/local/java
</PRE>

<P>
provided you put Cup and JLex in places as specified above. Notice also the
inclusion of the current working directory (<CODE>.</CODE>).
</P>
<P>
If <CODE>export</CODE> doesn't work: in some Unix shells,
the <CODE>CLASSPATH</CODE> variable is set with the command
</P>

<PRE>
    setenv CLASSPATH .:/usr/local/java/Cup:/usr/local/java
</PRE>

<P>
<I>Now</I> you will hopefully be able to compile and run the compiler.
</P>
<P>
<!-- NEW -->
</P>

<H2>Compiling the compiler in Haskell</H2>

<P>
Assuming you want to work in Haskell, do the following:
</P>

<PRE>
    bnfc -m -haskell Calc.cf
    make
</PRE>

<P>
If everything goes fine, this will create a parser test executable, which you
can try out in the following way:
</P>

<PRE>
    echo "23 + 4 * 70" | ./TestCalc
  
    Parse Successful!
    [Abstract Syntax]
    EAdd (EInt 23) (EMul (EInt 4) (EInt 70))
    [Linearized tree]
    23 + 4 * 70
</PRE>

<P>
The tools that you need to have installed are
</P>

<UL>
<LI><A HREF="http://www.haskell.org/ghc/">GHC</A>, the Glasgow Haskell compiler
<LI><A HREF="http://www.haskell.org/alex/">Alex</A>, a lexer generator for Haskell
<LI><A HREF="http://www.haskell.org/happy/">Happy</A>, a parser generator for Haskell
</UL>

<P>
<!-- NEW -->
</P>

<H2>Compiling the compiler in C and C++</H2>

<P>
Assuming you want to work in C or C++, do the following:
</P>

<PRE>
    bnfc -m -c Calc.cf           # in C
    bnfc -m -cpp_stl Calc.cf     # in C++
    make
</PRE>

<P>
If everything goes fine, this will create a parser test executable, which you
can try out in the following way:
</P>

<PRE>
    echo "23 + 4 * 70" | ./TestCalc
  
    Parse Successful!
    [Abstract Syntax]
    EAdd (EInt 23) (EMul (EInt 4) (EInt 70))
    [Linearized tree]
    23 + 4 * 70
</PRE>

<P>
The tools that you need to have installed are
</P>

<UL>
<LI><A HREF="http://gcc.gnu.org/">GCC</A> compiler for C and C++
<LI><A HREF="http://www.gnu.org/software/bison/">Bison</A> parser generator for C and C++,
  version 1.875
<LI><A HREF="http://www.gnu.org/software/flex/">Flex</A> lexer generator for C and C++,
  version 2.5.4
</UL>

<P>
The most common source of problems is wrong version of Bison. If the test
program always results in <CODE>error: parse error</CODE>, check the version with
<CODE>bison --version</CODE>.
</P>
<P>
<!-- NEW -->
</P>

<H2>Content of a BNFC file</H2>

<P>
A BNFC source file is a sequence of <B>rules</B>, where most rules have the
format
</P>

<PRE>
    LABEL . VALUE_CATEGORY ::= PRODUCTION ;
</PRE>

<P>
The <CODE>LABEL</CODE> and <CODE>VALUE_CATEGORY</CODE> are <B>identifiers</B> (without quotes).
</P>
<P>
The <CODE>PRODUCTION</CODE> is a sequence of
</P>

<UL>
<LI>identifiers, called <B>nonterminals</B>, and/or
<LI><B>string literals</B> (with quotes), called <B>terminals</B>.
</UL>

<P>
The rule has the following semantics:
</P>
	<BLOCKQUOTE>
	A <B>tree</B> of type <CODE>VALUE_CATEGORY</CODE> can be built with <CODE>LABEL</CODE> as the
	topmost node, from any sequence specified by the production,
	whose nonterminals give the subtrees of this new tree.
	</BLOCKQUOTE>
<P>
Types of trees are the <B>categories</B> of the grammar. Tree labels are
the <B>constructors</B> of those categories.
</P>
<P>
<!-- NEW -->
</P>

<H2>Common problems with identifiers</H2>

<P>
All categories and constructors should
</P>

<UL>
<LI>begin with a capital letter,
<LI>contain only ASCII letters, digits, and underscores (<CODE>_</CODE>), and
<LI>be distinct from each other.
</UL>

<P>
These three conditions guarantee that the grammar
will work on <I>all</I> platforms.
</P>
<P>
<!-- NEW -->
</P>

<H2>Example of a tree</H2>

<P>
In the examples above, the string
</P>

<PRE>
    23 + 4 * 70
</PRE>

<P>
was compiled into a tree displayed as follows:
</P>

<PRE>
    EAdd (EInt 23) (EMul (EInt 4) (EInt 70))
</PRE>

<P>
This is just a handy (machine-readable!) notation for the "real" tree
</P>
<P>
  <IMG ALIGN="right" SRC="tuttree.png" BORDER="0" ALT="">
</P>
<P>
(You may also notice that it is <I>exactly</I> the notation Haskell programmers
use for specifying trees.)
</P>
<P>
<!-- NEW -->
</P>

<H2>Precedence levels</H2>

<P>
How does BNFC know that addition is above multiplication? I.e., why isn't the tree
</P>

<PRE>
    EMul (EAdd (EInt 23) (EInt 4)) (EInt 70)
</PRE>

<P>
This is due to the fact that multiplication expressions are given
<B>higher precedence</B>.
</P>
<P>
The nonterminal <CODE>Exp</CODE> has <B>precedence level</B> 0 (actually, we could write
<CODE>Exp0</CODE> to mean the same), <CODE>Exp1</CODE> has precedence level 1, etc.
</P>
<P>
The rule
</P>

<PRE>
    EAdd. Exp  ::= Exp  "+" Exp1 ;
</PRE>

<P>
says: <CODE>EAdd</CODE> forms an expression of level 0 from an expression of level 0
on the left of <CODE>+</CODE> and of level 1 on the right.
</P>
<P>
Likewise, the rule
</P>

<PRE>
    EMul. Exp1 ::= Exp1 "*" Exp2 ;
</PRE>

<P>
says: <CODE>EMul</CODE> form an expression of level 1 from an expression of level 1
on the left of <CODE>*</CODE> and of level 2 on the right.
</P>
<P>
<!-- NEW -->
</P>

<H2>Semantics of precedence levels</H2>

<P>
An expression of higher level can always be used on lower levels as well.
</P>

<UL>
<LI><CODE>2 + 3</CODE> is fine: level 2 (<CODE>EInt</CODE>) is used on levels 0 and 1, respectively.
</UL>

<P>
An expression of any level can be lifted to the highest level by putting
it in parentheses.
</P>

<UL>
<LI><CODE>(5 + 6)</CODE> is an expression of level 2.
</UL>

<P>
The rule <CODE>coercions Exp 2</CODE> says that 2 is the highest level for <CODE>Exp</CODE>.
</P>
<P>
All precedence variants of a nonterminal denote the same type.
</P>

<UL>
<LI><CODE>2</CODE>, <CODE>2 + 2</CODE>, and <CODE>2 * 2</CODE> are of the same type, <CODE>Exp</CODE>.
</UL>

<P>
This means that a type-correct tree remains type-correct, if a subtree
of category <CODE>Exp</CODE><I>i</I> is changed into a subtree of <CODE>Exp</CODE><I>j</I>.
</P>
<P>
<!-- NEW -->
</P>

<H2>The deeper semantics of precedence levels: dummy labels</H2>

<P>
BNFC permits a <B>dummy label</B>, which does not construct a new tree but
just returns the old one (which must be of same type):
</P>

<PRE>
    _. Exp2 ::= "(" Exp ")" ;
</PRE>

<P>
The rule (<CODE>coercions Exp 2</CODE>) is a shorthand for a group of dummy rules:
</P>

<PRE>
    _. Exp  ::= Exp1 ;
    _. Exp1 ::= Exp2 ;
    _. Exp2 ::= "(" Exp ")" ;
</PRE>

<P>
<!-- NEW -->
</P>

<H2>Compiler components</H2>

<P>
BNFC generates the following components:
</P>

<UL>
<LI>lexer: the JLex/Alex/Flex file
<LI>parser: the Cup/Happy/Bison file
<LI>abstract syntax: a bunch of Java/Haskell/C/C++ files
<LI>pretty printer: a Java/Haskell/C/C++ file
<LI>back end skeleton: a Java/Haskell/C/C++ file
<LI>grammar document: a Latex file
</UL>

<P>
The first three belong to a <B>compiler front end</B>, analysing the source code.
</P>
<P>
The <B>compiler back end</B> can either
</P>

<UL>
<LI>generate some target code (compiler), or
<LI>run the source code tree directly (interpreter).
</UL>

<P>
<!-- NEW -->
</P>

<H2>Abstract syntax</H2>

<P>
The hub of a modern compiler:
</P>

<UL>
<LI>target of the front end, and
<LI>starting point of the back end.
</UL>

<P>
In the middle of the front end and back end, there is manipulation of
abstract syntax, such as type checking and optimizations.
</P>
<P>
Abstract syntax representations in programming languages (as generated by BNFC):
</P>

<UL>
<LI>Haskell: algebraic datatypes
<LI>Java and C++: classes and subclasses
<LI>C: unions of structures
</UL>

<P>
<!-- NEW -->
</P>

<H2>Abstract syntax in Haskell</H2>

<P>
This is the most straightforward, so we start from it.
</P>
<P>
For every type in the grammar, a <CODE>data</CODE> definition is produced:
</P>

<PRE>
    data Exp =
       EAdd Exp Exp
     | ESub Exp Exp
     | EMul Exp Exp
     | EDiv Exp Exp
     | EInt Integer
      deriving (Eq,Ord,Show)
</PRE>

<P>
The <CODE>deriving</CODE> part says that the type <CODE>Exp</CODE> has equality and order
predicates, and its objects can be shown as strings.
</P>
<P>
The complete code is in the file
<A HREF="calc/haskell/AbsCalc.hs"><CODE>AbsCalc.hs</CODE></A>.
</P>
<P>
<!-- NEW -->
</P>

<H2>An interpreter in Haskell: the tree traversal</H2>

<P>
We write a program that parses an arithmetic expression and returns a
numeric value.
</P>
<P>
Here is the tree traversal: pattern matching on the type <CODE>Exp</CODE>.
</P>

<PRE>
    module Interpreter where
  
    import AbsCalc
  
    interpret :: Exp -&gt; Integer
    interpret x = case x of
      EAdd exp0 exp  -&gt; interpret exp0 + interpret exp
      ESub exp0 exp  -&gt; interpret exp0 - interpret exp
      EMul exp0 exp  -&gt; interpret exp0 * interpret exp
      EDiv exp0 exp  -&gt; interpret exp0 `div` interpret exp
      EInt n  -&gt; n
</PRE>

<P>
The complete code is in the file
<A HREF="calc/haskell/Interpreter.hs"><CODE>Interpreter.hs</CODE></A>.
</P>
<P>
<!-- NEW -->
</P>

<H2>An interpreter in Haskell: the main function</H2>

<P>
We write a module reading string input calling <CODE>Interpreter.interpret</CODE>.
</P>
<P>
The string is first lexed and parsed. The file
<A HREF="calc/haskell/Interpret.hs"><CODE>Interpret.hs</CODE></A>
is modified from
<A HREF="calc/haskell/TestCalc.hs"><CODE>TestCalc.hs</CODE></A>.
</P>

<PRE>
    module Main where
  
    import LexCalc
    import ParCalc
    import AbsCalc
    import Interpreter
  
    import ErrM
  
    main = do
      interact calc
      putStrLn ""
  
    calc s =
      let Ok e = pExp (myLexer s)
      in show (interpret e)
</PRE>

<P>
<!-- NEW -->
</P>

<H2>Skeleton for tree processing in Haskell</H2>

<P>
Recursive functions making case analysis on trees is used in almost all
components of the compiler.
</P>
<P>
BNFC gives a template for this, the <B>skeleton</B> file
<A HREF="calc/haskell/SkelCalc.hs"><CODE>SkelCalc.hs</CODE></A>,
with a dummy tree processing function:
</P>

<PRE>
    transExp :: Exp -&gt; Result
    transExp x = case x of
      EAdd exp0 exp  -&gt; failure x
      ESub exp0 exp  -&gt; failure x
      EMul exp0 exp  -&gt; failure x
      EDiv exp0 exp  -&gt; failure x
      EInt n  -&gt; failure x
  
    type Result = Err String
  
    failure :: Show a =&gt; a -&gt; Result
    failure x = Bad $ "Undefined case: " ++ show x
</PRE>

<P>
<!-- NEW -->
</P>

<H2>Compiling and running the interpreter in Haskell</H2>

<P>
Compile with GHC:
</P>

<PRE>
    ghc --make Interpret.hs
</PRE>

<P>
Run on command-line input:
</P>

<PRE>
    echo "1 + 2 * 3" | ./Interpret
    7
</PRE>

<P>
Run on file input (<A HREF="calc/ex1.calc"><CODE>ex1.calc</CODE></A>):
</P>

<PRE>
    ./Interpret &lt; ex1.calc
    9102
</PRE>

<P>
Now, if you are not working in Java, C, or C++, you can take a long leap.
</P>
<P>
<!-- NEW -->
</P>

<H2>Abstract syntax in Java</H2>

<P>
For each category in the grammar, an abstract base class is generated.
</P>
<P>
For each constructor of the category, a class extending the base class.
</P>
<P>
This means quite a few files, put into the subdirectory
<A HREF="calc/java/Calc/Absyn/"><CODE>Calc/Absyn/</CODE></A>:
</P>

<PRE>
    Calc/Absyn/EAdd.java
    Calc/Absyn/EDiv.java
    Calc/Absyn/EInt.java
    Calc/Absyn/EMul.java
    Calc/Absyn/ESub.java
    Calc/Absyn/Exp.java
</PRE>

<P>
<!-- NEW -->
</P>

<H2>Inside the abstract syntax Java classes</H2>

<PRE>
    public abstract class Exp implements java.io.Serializable {
  
    // some code that we explain later
    }
  
    public class EAdd extends Exp {
      public final Exp exp_1, exp_2;
      public EAdd(Exp p1, Exp p2) { exp_1 = p1; exp_2 = p2; }
  
    // some more code that we explain later
    }
  
    /* the same for ESub, EMul, EDiv */
  
    public class EInt extends Exp {
      public final Integer integer_;
      public EInt(Integer p1) { integer_ = p1; }
    }
</PRE>

<P>
<!-- NEW -->
</P>

<H2>Tree processing in Java: the Visitor</H2>

<P>
For each category in the grammar, a Visitor interface and
an abstract accept method are generated.
</P>
<P>
For each constructor of the category, an accept method overriding the
abstract accept.
</P>

<PRE>
    public abstract class Exp implements java.io.Serializable {
      public abstract &lt;R,A&gt; R accept(Exp.Visitor&lt;R,A&gt; v, A arg);
      public interface Visitor &lt;R,A&gt; {
        public R visit(Calc.Absyn.EAdd p, A arg);
        public R visit(Calc.Absyn.ESub p, A arg);
        public R visit(Calc.Absyn.EMul p, A arg);
        public R visit(Calc.Absyn.EDiv p, A arg);
        public R visit(Calc.Absyn.EInt p, A arg);
      }
    }
  
    public class EAdd extends Exp {
      public final Exp exp_1, exp_2;
      public EAdd(Exp p1, Exp p2) { exp_1 = p1; exp_2 = p2; }
  
      public &lt;R,A&gt; R accept(Calc.Absyn.Exp.Visitor&lt;R,A&gt; v, A arg)
      {
        return v.visit(this, arg);
      }
</PRE>

<P>
<!-- NEW -->
</P>

<H2>The meaning of the Java Visitor</H2>

<P>
<CODE>interface Exp.Visitor &lt;R,A&gt;</CODE>: collection of methods for
</P>

<UL>
<LI>visiting trees of type <CODE>Exp</CODE>
<LI>returning a value of type <CODE>R</CODE>
<LI>using an extra argument of type <CODE>A</CODE>
</UL>

<P>
<CODE>R accept(Exp.Visitor&lt;R,A&gt; v, A arg)</CODE>: a method for using a visitor and
returning a value
</P>
<P>
Recursive tree traversal:
</P>

<UL>
<LI>write a class that implements <CODE>Exp.Visitor</CODE>
<LI>override the default <CODE>visit</CODE> methods
</UL>

<P>
<!-- NEW -->
</P>

<H2>An interpreter in Java: tree processing</H2>

<P>
The program uses the visitor skeleton as basis, with <CODE>Integer</CODE> as return type
and <CODE>Object</CODE> as argument type (a dummy argument):
</P>

<PRE>
    private static class Calculator implements Exp.Visitor&lt;Integer,Object&gt; {
  
      public Integer visit(Calc.Absyn.EAdd p,Object o)
      {
        Integer a = p.exp_1.accept(this, null);
        Integer b = p.exp_2.accept(this, null);
        return a + b;
      }
  
      /* correspondingly for ESub, EMul, EDiv */
  
      public Integer visit(Calc.Absyn.EInt p, Object o)
      {
        return p.integer_;
      }
    }
</PRE>

<P>
<!-- NEW -->
</P>

<H2>Java code around tree processing</H2>

<P>
The complete code is in the file
<A HREF="calc/java/Calc/Interpreter.java"><CODE>Interpreter.java</CODE></A>:
</P>

<PRE>
    package Calc;
    import Calc.Absyn.*;
  
    public class Interpreter {
  
      public static Integer interpret(Exp e)
      {
  	Exp exp = (Exp)e ;
  	return interpretExp(exp, null) ;
      }
  
      private static Integer interpretExp(Exp e, Object o)
      {
  	return e.accept(new Calculator(), null) ;
      }
  
      private static class Calculator implements Exp.Visitor&lt;Integer,Object&gt; {
  
      // the tree traversal: see previous section
      }
    }
</PRE>

<P>
<!-- NEW -->
</P>

<H2>Tree processing in Java: the skeleton</H2>

<P>
You can use the BNFC-generated file
<A HREF="calc/java/Calc/VisitSkel.java"><CODE>Calc/VisitSkel.java</CODE></A> as a template for
writing your own visitor applications:
</P>

<PRE>
    public class VisitSkel {
      public class ExpVisitor&lt;R,A&gt; implements Exp.Visitor&lt;R,A&gt; {
        public R visit(Calc.Absyn.EAdd p, A arg)
        {
          /* Code For EAdd Goes Here */
          p.exp_1.accept(new ExpVisitor&lt;R,A&gt;(), arg);
          p.exp_2.accept(new ExpVisitor&lt;R,A&gt;(), arg);
          return null;
        }
  
        public R visit(Calc.Absyn.EInt p, A arg)
        {
          /* Code For EInt Goes Here */
          //p.integer_;
          return null;
        }
      }
    }
</PRE>

<P>
<!-- NEW -->
</P>

<H2>An interpreter in Java: main class</H2>

<P>
The program
<A HREF="calc/java/Calc/Interpret.java"><CODE>Interpret.java</CODE></A>
modifies the BNFC-generated
<A HREF="calc/java/Calc/Test.java"><CODE>Test.java</CODE></A>,
by changing the tree output
to a call of the interpreter:
</P>

<PRE>
    public class Interpret {
      public static void main(String args[]) throws Exception
      {
        Yylex l  = new Yylex(System.in) ;
        parser p = new parser(l, l.getSymbolFactory()) ;
        Calc.Absyn.Exp parse_tree = p.pExp() ;
        System.out.println(Interpreter.interpret(parse_tree)) ;
      }
    }
</PRE>

<P>
(We have omitted error handling.)
</P>
<P>
<!-- NEW -->
</P>

<H2>Compiling and running the interpreter in Java</H2>

<P>
Compile with javac:
</P>

<PRE>
    javac Calc/Interpret.java
</PRE>

<P>
Run on command-line input:
</P>

<PRE>
    echo "1 + 2 * 3" | java Calc/Interpret
    7
</PRE>

<P>
Run on file input:
</P>

<PRE>
    java Calc/Interpret &lt; ex1.calc
    9102
</PRE>

<P>
<!-- NEW -->
</P>

<H2>Abstract syntax in C</H2>

<P>
Simpler than in Java but more complex than in Haskell.
</P>
<P>
For every type in the grammar, a structure containing a
tagged union of structures is created:
</P>

<PRE>
    struct Exp_ {
      enum { is_EAdd, is_ESub, is_EMul, is_EDiv, is_EInt } kind;
      union {
        struct { Exp exp_1, exp_2; } eadd_;
        struct { Exp exp_1, exp_2; } esub_;
        struct { Exp exp_1, exp_2; } emul_;
        struct { Exp exp_1, exp_2; } ediv_;
        struct { Integer integer_; } eint_;
      } u;
    };
  
    typedef struct Exp_ *Exp ;
  
    Exp make_EAdd(Exp p0, Exp p1);
    Exp make_ESub(Exp p0, Exp p1);
    Exp make_EMul(Exp p0, Exp p1);
    Exp make_EDiv(Exp p0, Exp p1);
    Exp make_EInt(Integer p0);
</PRE>

<P>
The complete code is in the files
<A HREF="calc/c/Absyn.h"><CODE>Absyn.h</CODE></A> and
<A HREF="calc/c/Absyn.c"><CODE>Absyn.c</CODE></A>.
</P>
<P>
<!-- NEW -->
</P>

<H2>An interpreter in C: tree traversal</H2>

<P>
We modify the skeleton (next section),
changing the return type and the name of <CODE>visitExp</CODE>:
</P>

<PRE>
    int interpret(Exp _p_)
    {
      switch(_p_-&gt;kind)
      {
      case is_EAdd:
        return interpret(_p_-&gt;u.eadd_.exp_1) + interpret(_p_-&gt;u.eadd_.exp_2) ;
      case is_ESub:
        return interpret(_p_-&gt;u.eadd_.exp_1) - interpret(_p_-&gt;u.eadd_.exp_2) ;
      case is_EMul:
        return interpret(_p_-&gt;u.eadd_.exp_1) * interpret(_p_-&gt;u.eadd_.exp_2) ;
      case is_EDiv:
        return interpret(_p_-&gt;u.eadd_.exp_1) / interpret(_p_-&gt;u.eadd_.exp_2) ;
      case is_EInt:
        return _p_-&gt;u.eint_.integer_ ;
      }
    }
</PRE>

<P>
The complete code is in the files
<A HREF="calc/c/Interpreter.h"><CODE>Interpreter.h</CODE></A> and
<A HREF="calc/c/Interpreter.c"><CODE>Interpreter.c</CODE></A>.
</P>
<P>
<!-- NEW -->
</P>

<H2>Tree processing skeleton in C</H2>

<P>
The skeleton defines <CODE>visit</CODE> functions for all categories using
case analysis over constructor tags.
</P>

<PRE>
    void visitInteger(Integer i);
  
    void visitExp(Exp _p_)
    {
      switch(_p_-&gt;kind)
      {
      case is_EAdd:
        /* Code for EAdd Goes Here */
        visitExp(_p_-&gt;u.eadd_.exp_1);
        visitExp(_p_-&gt;u.eadd_.exp_2);
        break;
  
      /* similarly for other binary ops */
  
      case is_EInt:
        /* Code for EInt Goes Here */
        visitInteger(_p_-&gt;u.eint_.integer_);
        break;
      }
    }
</PRE>

<P>
The complete code is in the files
<A HREF="calc/c/Skeleton.h"><CODE>Skeleton.h</CODE></A> and
<A HREF="calc/c/Skeleton.c"><CODE>Skeleton.c</CODE></A>.
</P>
<P>
<!-- NEW -->
</P>

<H2>An interpreter in C: main function</H2>

<P>
The string is first lexed and parsed.
</P>

<PRE>
    #include &lt;stdio.h&gt;
    #include &lt;stdlib.h&gt;
    #include "Parser.h"
    #include "Printer.h"
    #include "Absyn.h"
    #include "Interpreter.h"
  
    int main(int argc, char ** argv)
    {
      FILE *input;
      input = stdin;
      Exp parse_tree = pExp(input);
      if (parse_tree)
      {
        printf("%d\n", interpret(parse_tree));
        return 0;
      }
      return 1;
    }
</PRE>

<P>
The complete code is in the file
<A HREF="calc/c/Interpret.c"><CODE>Interpret.c</CODE></A>,
modified from the BNFC-generated
<A HREF="calc/c/Test.c"><CODE>Test.c</CODE></A>.
</P>
<P>
<!-- NEW -->
</P>

<H2>Compiling and running the interpreter in C</H2>

<P>
Compile with GCC:
</P>

<PRE>
    gcc -c Interpreter.c Interpret.c
    gcc *.o -o interpret
</PRE>

<P>
Run on command-line input:
</P>

<PRE>
    echo "1 + 2 * 3" | ./interpret
    7
</PRE>

<P>
Run on file input:
</P>

<PRE>
    ./interpret &lt; ex1.calc
    9102
</PRE>

<P>
<!-- NEW -->
</P>

<H2>A larger source language</H2>

<P>
We want to build a grammar of "C--", a fragment of C sufficient for
parsing the following Fibonacci program:
</P>

<PRE>
    // a fibonacci function showing most features of the CMM language
  
    int mx ()
    {
      return 5000000 ;
    }
  
    int main ()
    {
      int lo ;
      int hi ;
      lo = 1 ;
      hi = lo ;
      printf("%d",lo) ;
      while (hi &lt; mx()) {
        printf("%d",hi) ;
        hi = lo + hi ;
        lo = hi - lo ;
      }
    return 0 ;
    }
</PRE>

<P>
<!-- NEW -->
</P>

<H2>List categories</H2>

<P>
<B>Lists</B> are used everywhere in grammars. BNFC the category symbol
<CODE>[C]</CODE> for a list of <CODE>C</CODE>s.
</P>
<P>
Thus a program is a list of functions:
</P>

<PRE>
    Prog. Program ::= [Function] ;
</PRE>

<P>
A function has a list of declarations and a list of statements:
</P>

<PRE>
    Fun. Function ::= Type Ident "(" [Decl] ")" "{" [Stm] "}" ;
</PRE>

<P>
<!-- NEW -->
</P>

<H2>Terminators and separators</H2>

<P>
Lists have <B>terminators</B> and <B>separators</B>:
</P>

<PRE>
    terminator Function "" ;
    terminator Stm "" ;
    separator Decl "," ;
</PRE>

<P>
The empty terminator <CODE>""</CODE> means the elements are just written next to
each other.
</P>
<P>
A list can be forced to have at least one element:
</P>

<PRE>
    separator nonempty Ident "," ;
</PRE>

<P>
<!-- NEW -->
</P>

<H2>Abstract syntax for lists</H2>

<P>
Haskell: <CODE>[C]</CODE> is translated as <CODE>[C]</CODE>
</P>
<P>
Java 1.5: <CODE>[C]</CODE> is translated as <CODE>java.util.LinkedList&lt;C&gt;</CODE>
</P>
<P>
C++ with STL: <CODE>[C]</CODE> is translated as <CODE>vector&lt;C*&gt;</CODE>
</P>
<P>
C: <CODE>[C]</CODE> is translated as a <CODE>struct</CODE> implementing the linked list
of <CODE>C</CODE>.
</P>
<P>
<!-- NEW -->
</P>

<H2>Comments</H2>

<P>
Comments are parts of source codes that the compiler ignores.
</P>
<P>
BNFC permits the definition of two kinds of comments: one-line and enclosed.
</P>
<P>
They are defined in the following ways for C--:
</P>

<PRE>
    comment "//" ;
    comment "/*" "*/" ;
</PRE>

<P>
Thus one-line comment definitions define the beginning of a comment, and
enclosed comment definitions the beginning and the end.
</P>
<P>
Commends are resolved by the lexer, using a finite automaton. Therefore
nested comments are not possible.
</P>
<P>
<!-- NEW -->
</P>

<H2>Complete grammar for C--</H2>

<PRE>
  comment "//" ;
  comment "/*" "*/" ;
  
  Prog. Program  ::= [Function] ;
  Fun.  Function ::= Type Ident "(" [Decl] ")" "{" [Stm] "}" ;
  Dec.  Decl     ::= Type [Ident] ;
  
  terminator Function "" ;
  terminator Stm "" ;
  separator  Decl "," ;
  separator  nonempty Ident "," ;
  
  SDecl.   Stm ::= Decl ";"  ;
  SExp.    Stm ::= Exp ";" ;
  SBlock.  Stm ::= "{" [Stm] "}" ;
  SWhile.  Stm ::= "while" "(" Exp ")" Stm ;
  SReturn. Stm ::= "return" Exp  ";" ;
  
  EAss.    Exp  ::= Ident "=" Exp ;
  ELt.     Exp1 ::= Exp2 "&lt;" Exp2 ;
  EAdd.    Exp2 ::= Exp2 "+" Exp3 ;
  ESub.    Exp2 ::= Exp2 "-" Exp3 ;
  EMul.    Exp3 ::= Exp3 "*" Exp4 ;
  Call.    Exp4 ::= Ident "(" [Exp] ")" ;
  EVar.    Exp4 ::= Ident ;
  EStr.    Exp4 ::= String ;
  EInt.    Exp4 ::= Integer ;
  EDouble. Exp4 ::= Double ;
  
  coercions Exp 4 ;
  
  separator Exp "," ;
  
  TInt.    Type ::= "int" ;
  TDouble. Type ::= "double" ;
</PRE>

<P>
<!-- NEW -->
</P>

<H2>Built-in token types</H2>

<P>
These types are hard-coded and cannot be value types of rules:
</P>

<UL>
<LI><CODE>Integer</CODE>: sequence of digits, e.g. <CODE>123445425425436</CODE>
<LI><CODE>Double</CODE>: two sequences of digits with a decimal point in between,
  and an optional exponent, e.g. <CODE>7.098e-7</CODE>
<LI><CODE>String</CODE>: any characters between double quotes, e.g. <CODE>"hello world"</CODE>
<LI><CODE>Char</CODE>: any character between single quotes, e.g. <CODE>'7'</CODE>
<LI><CODE>Ident</CODE>: a letter followed by letters, digits, and characters <CODE>-'</CODE>, e.g.
  <CODE>r2_out'</CODE>
</UL>

<P>
Precise definitions are given in the LBNF report.
</P>
<P>
<!-- NEW -->
</P>

<H2>Token definitions</H2>

<P>
A grammar can define new token types by using regular expressions.
Here is an example of the format:
</P>

<PRE>
    token CIdent (letter (letter | digit | '_')*) ;
</PRE>

<P>
See LBNF report for more information.
</P>
<P>
<!-- NEW -->
</P>

<H2>Remembering the position of a token</H2>

<P>
The position of a token can be passed to the syntax tree:
</P>

<PRE>
    position token CIdent (letter (letter | digit | '_')*) ;
</PRE>

<P>
See LBNF report for more information.
</P>
<P>
<!-- NEW -->
</P>

<H2>Common problems</H2>

<P>
Java: CLASSPATH, should contain:
</P>

<UL>
<LI>the current directory <CODE>.</CODE>
<LI>Cup directory
<LI>JLex directory
<P></P>
C and C++: Bison version, should be 1.875.
<P></P>
All grammars: conflicts. Use parser diagnosis tools to find and solve them!
</UL>

<P>
<!-- NEW -->
</P>

<H2>Further reading</H2>

<P>
BNFC homepage:
<A HREF="http://digitalgrammars.com/bnfc"><CODE>digitalgrammars.com/bnfc</CODE></A>
</P>
<P>
The LBNF report, covering the whole Labelled BNF grammar language used in BNFC:
<A HREF="http://digitalgrammars.com/bnfc/doc/LBNF-report.pdf"><CODE>digitalgrammars.com/bnfc/doc/LBNF-report.pdf</CODE></A>
</P>
<P>
Chalmers Programming Languages course notes:
<A HREF="http://www.cs.chalmers.se/Cs/Grundutb/Kurser/progs/current/"><CODE>www.cs.chalmers.se/Cs/Grundutb/Kurser/progs/current/</CODE></A>
</P>
<P>
Appel's book series <I>Modern Compiler Implementation in ML/C/Java</I>.
BNFC generates abstract syntax representations as used in these books.
</P>
<P>
Dragon book: Aho, Lam, Sethi, Ullman,
<I>Compilers Principles, Techniques and Tools</I>, 2007.
</P>

<!-- html code generated by txt2tags 2.6 (http://txt2tags.org) -->
<!-- cmdline: txt2tags -thtml bnfc-tutorial.txt -->
</BODY></HTML>
